Abstract

The evolution of entropy is derived with respect to dynamical systems. For a stochastic system,
its relative entropy $D$ evolves in accordance with the second law of thermodynamics; its absolute
entropy $H$ may do so as well, provided that the stochastic perturbation is additive and the flow of
the vector field is nondivergent. For a deterministic system, $dH/dt$ is equal to the mathematical
expectation of the divergence of the flow (a result obtained before), and, remarkably, $dD/dt = 0$.
That is to say, relative entropy is always conserved. So, for a nonlinear system, though the
trajectories of the state variables, say $\mathbf{x}$, may appear chaotic in the phase space, say $\Omega$, those of
the density function $p(\mathbf{x})$ in the new "phase space" $L^1(\Omega)$ are not; the corresponding Lyapunov
exponent is always zero. This result is expected to have important implications for ensemble
predictions in many applied fields, and may help to analyze chaotic data sets.

In thermodynamics, it is well known that entropy production is closely related to phase
space contraction. In information theory, a similar relation has also been established;
particularly, in the context of a deterministic system, it has been shown that the time evolution
of absolute entropy, namely Shannon entropy, is precisely equal to the mathematical
expectation of the divergence of the flow (cf. Eq. (5) below) [2]. This elegant relation has since
led to the establishment of a rigorous formalism of information flow (or information transfer,
as referred to in the literature), a fundamental notion in general physics which has broad
applications in a variety of disciplines [2][3].

However, it is also well known that absolute entropy, denoted $H$ henceforth, need
not be consistent with the second law of thermodynamics [4], which states that the entropy
of an isolated system cannot decrease as time goes on. Although the connection between
information entropy and thermodynamic entropy is still under debate [?], it would be better to
have the former put on a physical footing. One then naturally asks under what
circumstances the consistency may be established. This forms one of the questions we want
to address in this study.

On the other hand, relative entropy (hereafter $D$) does comply with the second law of
thermodynamics [3]. This important property, among others, makes $D$ an ideal physical
measure in many contexts, as recognized by Kleeman (2002), and has led to a resurgence
of interest in it during the past decade in different applications [?]. Considering that $H$ has
a concise evolutionary law, one naturally wonders how $D$ evolves. In [?], this is discussed
in the framework of a Markov chain, and an inequality like the aforementioned
second law is obtained. But that result is rather generic; in the context of a dynamical system,
one may expect a more specific and, hopefully, more definite statement. Indeed, as we will
see soon, relative entropy is actually conserved in deterministic systems. This remarkable
result, together with others, is what we derive in the following.

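First consider an $n$-dimensional deterministic system with randomness limited within
initial conditions:

$$ \frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}, t), \tag{1} $$

where $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$ are the state variables. Associated with $\mathbf{x}$ there is a joint
probability density function, $p = p(t; \mathbf{x}) = p(t; x_1, x_2, \ldots, x_n)$, and hence an absolute entropy

$$ H = -\int p \log p \, d\mathbf{x}, \tag{2} $$

and a relative entropy

$$ D = \int p \log \frac{p}{q} \, d\mathbf{x} = -H - \int p \log q \, d\mathbf{x}, \tag{3} $$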

with some reference density $q$ of $\mathbf{x}$. We are interested in how $H$ and $D$ evolve with respect to
(1). For this purpose, assume that $p$, $q$, and their derivatives are all compactly supported;
further assume enough regularity for $p$, $q$, $D$, and $H$. The mathematical details are
omitted here for the sake of a broad readership; interested readers may consult [3] for a detailed
discussion. Note that the choice of the reference density $q$ here differs slightly from what
is commonly used in applications [6], particularly in predictability studies, where $q$ is
usually chosen to be some constant distribution (the initial distribution, for example). We
require that $q$ also evolve, and that it follow the same evolution as $p$ does. Only in this way
can we have the neat result on $D$, as will be derived soon. (Perhaps this is the reason why
the following result was not seen before, as past studies have focused on the choice of a
constant $q$.)

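Corresponding to (1) there is a Liouville equation

$$ \frac{\partial p}{\partial t} + \nabla \cdot (p\mathbf{F}) = 0, \tag{4} $$

governing the evolution of the joint density $p$. Multiplying (4) by $-(1 + \log p)$ and integrating
over $\mathbb{R}^n$, Liang and Kleeman obtain that [2]

$$ \frac{dH}{dt} = E(\nabla \cdot \mathbf{F}), \tag{5} $$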

where the operator $E$ stands for mathematical expectation (refer to [2] for the derivation).
In arriving at this formula, it was originally assumed that extreme events have a probability
of zero, which corresponds to our compact support assumption above. This makes sense
in practice and has been justified in [2], but even this assumption may be relaxed, and the
same formula follows [?].

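For the relative entropy $D$, differentiation of (3) with respect to $t$ gives

$$ \frac{dD}{dt} = -\frac{dH}{dt} - \int \frac{\partial p}{\partial t} \log q \, d\mathbf{x} - \int \frac{p}{q} \frac{\partial q}{\partial t} \, d\mathbf{x} = -\frac{dH}{dt} + (I) + (II). $$

The integrals are all understood to be over $\mathbb{R}^n$, and this simplification will be used hereafter,
unless otherwise indicated. The two shorthands are:

$$ (I) = \int \left[ \nabla \cdot (p\mathbf{F}) \log q \right] d\mathbf{x} = \int \nabla \cdot (p\mathbf{F} \log q) \, d\mathbf{x} - \int p\mathbf{F} \cdot \nabla(\log q) \, d\mathbf{x} = -E(\mathbf{F} \cdot \nabla \log q), $$

$$ (II) = -\int p \frac{\partial \log q}{\partial t} \, d\mathbf{x} = -E\left( \frac{\partial \log q}{\partial t} \right). $$

So

$$ \frac{dD}{dt} = -\frac{dH}{dt} - E\left( \frac{\partial \log q}{\partial t} + \mathbf{F} \cdot \nabla \log q \right) = -\frac{dH}{dt} - E\left[ \frac{1}{q} \left( \frac{\partial q}{\partial t} + \mathbf{F} \cdot \nabla q \right) \right]. \tag{6} $$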



Recall that q is also a joint density of x, so its evolution must follow the same Liouville 
equation, i.e., 

$$ \frac{\partial q}{\partial t} + \mathbf{F} \cdot \nabla q = -q \nabla \cdot \mathbf{F}. $$

The relative entropy evolution (6) thus becomes

$$ \frac{dD}{dt} = -\frac{dH}{dt} + E(\nabla \cdot \mathbf{F}). \tag{7} $$

Substitution of (5) for $dH/dt$ gives

$$ \frac{dD}{dt} = 0. \tag{8} $$



That is to say, relative entropy is conserved. 
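
As a quick sanity check of (5) and (8), the following is a minimal sketch in a setting where everything is available in closed form. It assumes the one-dimensional linear flow $dx/dt = ax$ (so that $\nabla \cdot \mathbf{F} = a$) and Gaussian densities $p$ and $q$; a Gaussian transported by this flow stays Gaussian, with its mean scaled by $e^{at}$ and its variance by $e^{2at}$. The function names and parameter values are illustrative choices, not taken from the text.

```python
import math

def gaussian_entropy(var):
    """Absolute (Shannon) entropy of a 1-D Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def gaussian_relative_entropy(m_p, v_p, m_q, v_q):
    """Relative entropy D = int p log(p/q) dx for two 1-D Gaussians."""
    return 0.5 * (math.log(v_q / v_p) + (v_p + (m_p - m_q) ** 2) / v_q - 1.0)

def push_forward(mean0, var0, a, t):
    """Mean and variance at time t of a Gaussian carried by dx/dt = a*x."""
    growth = math.exp(a * t)
    return mean0 * growth, var0 * growth ** 2

a = 0.7            # assumed flow dx/dt = a*x, so div F = a everywhere
p0 = (1.0, 0.5)    # (mean, variance) of p at t = 0, illustrative values
q0 = (-0.5, 2.0)   # (mean, variance) of q at t = 0, illustrative values

def H(t):
    return gaussian_entropy(push_forward(*p0, a, t)[1])

def D(t):
    m_p, v_p = push_forward(*p0, a, t)
    m_q, v_q = push_forward(*q0, a, t)
    return gaussian_relative_entropy(m_p, v_p, m_q, v_q)

# Central finite difference for dH/dt at t = 1; (5) predicts E(div F) = a.
dt = 1e-5
dH_dt = (H(1.0 + dt) - H(1.0 - dt)) / (2.0 * dt)
print("dH/dt:", dH_dt, "E(div F):", a)

# (8) predicts that D does not change in time because q follows the same flow.
print("D(0):", D(0.0), "D(3):", D(3.0))
```

In this example $H$ grows at the constant rate $a$, while $D$ stays the same between $t = 0$ and $t = 3$ up to rounding, which is only possible because $q$ is evolved together with $p$.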

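The above results are now generalized to systems with stochasticity included. Let
$\mathbf{w} = (w_1, w_2, \ldots, w_n)^T$ be an array of $n$ standard Wiener processes, and $\mathbf{B}$ a matrix which may
have dependency on both $\mathbf{x}$ and $t$. The system we are to consider has the form:

$$ d\mathbf{x} = \mathbf{F}(\mathbf{x}, t) \, dt + \mathbf{B}(\mathbf{x}, t) \, d\mathbf{w}. \tag{9} $$

Correspondingly the density evolves according to a Fokker-Planck equation

$$ \frac{\partial p}{\partial t} = -\nabla \cdot (p\mathbf{F}) + \frac{1}{2} \nabla\nabla : (p\mathbf{G}), \tag{10} $$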



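where $\mathbf{G} = \mathbf{B}\mathbf{B}^T$ is a nonnegative definite matrix. The double dot product here is defined
such that, for column vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, and $\mathbf{d}$,

$$ (\mathbf{a}\mathbf{b}) : (\mathbf{c}\mathbf{d}) = (\mathbf{a} \cdot \mathbf{c})(\mathbf{b} \cdot \mathbf{d}). $$

A dyad $\mathbf{a}\mathbf{b}$ in matrix notation is identified with $\mathbf{a}\mathbf{b}^T$.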
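
The definition can be checked componentwise. The sketch below assumes the usual componentwise reading of the double dot product for two matrices, $\mathbf{A} : \mathbf{B} = \sum_{i,j} A_{ij} B_{ij}$, which reproduces the rule above when both factors are dyads; the helper names and the vectors are arbitrary illustrative choices.

```python
def dot(u, v):
    """Ordinary dot product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def dyad(u, v):
    """Dyad uv written as the matrix u v^T (list of rows)."""
    return [[ui * vj for vj in v] for ui in u]

def double_dot(A, B):
    """A : B taken as the sum over i, j of A_ij * B_ij (assumed convention)."""
    return sum(aij * bij for row_a, row_b in zip(A, B)
               for aij, bij in zip(row_a, row_b))

a, b = [1.0, 2.0, -1.0], [0.5, 0.0, 3.0]
c, d = [2.0, -1.0, 4.0], [1.0, 1.0, -2.0]

lhs = double_dot(dyad(a, b), dyad(c, d))   # (ab) : (cd)
rhs = dot(a, c) * dot(b, d)                # (a . c)(b . d)
print(lhs, rhs)  # 22.0 22.0
```

With this reading, $\nabla\nabla : (p\mathbf{G})$ in (10) is the double sum of $\partial^2 (p G_{ij}) / \partial x_i \partial x_j$ over $i$ and $j$.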

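Multiplication of (10) by $-(1 + \log p)$, followed by an integration over the entire sample
space $\mathbb{R}^n$, yields an evolution of the absolute entropy

$$ \frac{dH}{dt} = E(\nabla \cdot \mathbf{F}) - \frac{1}{2} \int (1 + \log p) \, \nabla\nabla : (p\mathbf{G}) \, d\mathbf{x}. \tag{11} $$

In arriving at the first term on the right hand side, the previous result (i.e., (5)) with the Liouville
equation has been applied. For the second term, since $\int \nabla\nabla : (p\mathbf{G}) \, d\mathbf{x} = 0$ by the compact
support assumption, it results in

$$ -\frac{1}{2} \int \log p \, \nabla\nabla : (p\mathbf{G}) \, d\mathbf{x} = \frac{1}{2} \int \nabla \cdot (p\mathbf{G}) \cdot \nabla \log p \, d\mathbf{x} = -\frac{1}{2} \int p\mathbf{G} : \nabla\nabla \log p \, d\mathbf{x} = -\frac{1}{2} E(\mathbf{G} : \nabla\nabla \log p), $$

where integration by parts has been used. So

$$ \frac{dH}{dt} = E(\nabla \cdot \mathbf{F}) - \frac{1}{2} E(\mathbf{G} : \nabla\nabla \log p). \tag{12} $$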



One of our purposes in this study is to see whether the evolution of $H$ can be reconciled
with the second law of thermodynamics by taking away the effect of phase space
volume change, i.e., $E(\nabla \cdot \mathbf{F})$ in this formula. That is to say, we would like to see whether
$E(\mathbf{G} : \nabla\nabla \log p)$ is non-positive. Unfortunately, this need not be true in general. However,
if $\mathbf{G}$ is constant in $\mathbf{x}$ or, in other words, if the noise is additive, then $\mathbf{G}$ can be taken out of
the expectation. Integrating by parts,



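$$ \begin{aligned}
E(\mathbf{G} : \nabla\nabla \log p) &= \mathbf{G} : \int p \, (\nabla\nabla \log p) \, d\mathbf{x} \\
&= -\mathbf{G} : \int \nabla p \, \nabla \log p \, d\mathbf{x} \\
&= -\mathbf{G} : \int p \, \nabla \log p \, \nabla \log p \, d\mathbf{x} \\
&= -\mathbf{G} : E(\nabla \log p \, \nabla \log p) \\
&= -E(\nabla \log p \cdot \mathbf{G} \cdot \nabla \log p).
\end{aligned} $$

Because $\mathbf{G} = \mathbf{B}\mathbf{B}^T$ is nonnegative definite, $\nabla \log p \cdot \mathbf{G} \cdot \nabla \log p \ge 0$, hence

$$ \frac{dH}{dt} - E(\nabla \cdot \mathbf{F}) = \frac{1}{2} E(\nabla \log p \cdot \mathbf{G} \cdot \nabla \log p) \ge 0. \tag{13} $$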



That is to say, in this case, for systems without phase volume expansion/contraction in the
deterministic limit (such as a Hamiltonian system), absolute entropy evolves in accordance
with the second law of thermodynamics.

It is interesting to note that the above formula (13) may be linked to Fisher information
if the parameters, say $\mu_i$, of the distribution are bound to the state variables in a form of
translation such as that in a Gaussian process. In this case, one can replace the partial
derivatives with respect to $x_i$ by those with respect to $\mu_i$. And, accordingly,

$$ E(\nabla \log p \, \nabla \log p) = \mathbf{I}, $$

where $\mathbf{I} = (I_{ij})$ is the Fisher information matrix. So

$$ \frac{dH}{dt} = E(\nabla \cdot \mathbf{F}) + \frac{1}{2} \mathbf{G} : \mathbf{I}. \tag{14} $$
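
The additive-noise result (13) and its Fisher-information form (14) can be illustrated with a minimal sketch. It assumes a one-dimensional linear system $dx = ax \, dt + b \, dw$ with a Gaussian initial density, for which the density stays Gaussian and its variance $v$ obeys $dv/dt = 2av + b^2$; for a Gaussian, $\nabla \log p = -(x - \text{mean})/v$, so $E[(\nabla \log p)^2] = 1/v$. The names and numerical values are illustrative assumptions.

```python
import math

a, b = -0.4, 0.9   # assumed drift F = a*x and constant noise amplitude B = b
g = b * b          # G = B B^T reduces to the scalar b**2
var0 = 0.3         # initial variance of the Gaussian density p

def variance(t):
    """Exact solution of dv/dt = 2*a*v + g with v(0) = var0 (needs a != 0)."""
    stationary = -g / (2.0 * a)
    return stationary + (var0 - stationary) * math.exp(2.0 * a * t)

def entropy(t):
    """Absolute entropy of a 1-D Gaussian with the variance at time t."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance(t))

t, dt = 0.8, 1e-5
dH_dt = (entropy(t + dt) - entropy(t - dt)) / (2.0 * dt)  # left side, numerically

fisher = 1.0 / variance(t)       # E[(d log p / dx)^2] for a Gaussian
noise_term = 0.5 * g * fisher    # (1/2) G : I, the right side of (13)

print("dH/dt - E(div F):", dH_dt - a)
print("(1/2) G : I     :", noise_term)
```

Here $E(\nabla \cdot \mathbf{F}) = a$ is negative, yet the noise contribution is positive, so it is the difference $dH/dt - E(\nabla \cdot \mathbf{F})$, not $dH/dt$ itself, that is guaranteed to be nonnegative.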



Next look at the relative entropy (3). For the reference density $q$, it is also governed by
the Fokker-Planck equation, which reads

$$ \frac{\partial q}{\partial t} = -\nabla \cdot (q\mathbf{F}) + \frac{1}{2} \nabla\nabla : (q\mathbf{G}). \tag{15} $$

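Substituting (10) and (15) into the identity

$$ \frac{\partial (p \log q)}{\partial t} = \frac{\partial p}{\partial t} \log q + \frac{p}{q} \frac{\partial q}{\partial t} $$

for $\partial p / \partial t$ and $\partial q / \partial t$, and then integrating over $\mathbb{R}^n$, we get

$$ \begin{aligned}
-\frac{d}{dt} \int p \log q \, d\mathbf{x}
&= -\int \left[ \frac{\partial p}{\partial t} \log q + \frac{p}{q} \frac{\partial q}{\partial t} \right] d\mathbf{x} \\
&= -\int \log q \left[ -\nabla \cdot (p\mathbf{F}) + \frac{1}{2} \nabla\nabla : (p\mathbf{G}) \right] d\mathbf{x}
   - \int \frac{p}{q} \left[ -\nabla \cdot (q\mathbf{F}) + \frac{1}{2} \nabla\nabla : (q\mathbf{G}) \right] d\mathbf{x} \\
&= \int \left[ \log q \, \nabla \cdot (p\mathbf{F}) + \frac{p}{q} \nabla \cdot (q\mathbf{F}) \right] d\mathbf{x}
   - \frac{1}{2} \int \left[ \log q \, \nabla\nabla : (p\mathbf{G}) + \frac{p}{q} \nabla\nabla : (q\mathbf{G}) \right] d\mathbf{x} \\
&= \int \left[ \log q \, \nabla \cdot (p\mathbf{F}) + \nabla \log q \cdot p\mathbf{F} + p \nabla \cdot \mathbf{F} \right] d\mathbf{x}
   - \frac{1}{2} \int \left[ \log q \, \nabla\nabla : (p\mathbf{G}) + \frac{p}{q} \nabla\nabla : (q\mathbf{G}) \right] d\mathbf{x} \\
&= E(\nabla \cdot \mathbf{F}) - \frac{1}{2} \left[ \int \log q \, \nabla\nabla : (p\mathbf{G}) \, d\mathbf{x} + \int \frac{p}{q} \nabla\nabla : (q\mathbf{G}) \, d\mathbf{x} \right].
\end{aligned} \tag{16} $$

Subtracting (11) from above gives the time evolution of the relative entropy:

$$ \frac{dD}{dt} = \frac{1}{2} \int \left[ \log \frac{p}{q} \, \nabla\nabla : (p\mathbf{G}) + \nabla\nabla : (p\mathbf{G}) - \frac{p}{q} \nabla\nabla : (q\mathbf{G}) \right] d\mathbf{x}. \tag{17} $$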



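Integrating by parts, and using the compact support assumption, this becomes

$$ \begin{aligned}
\frac{dD}{dt}
&= -\frac{1}{2} \int \nabla \log \frac{p}{q} \cdot (\nabla p \cdot \mathbf{G} + p \nabla \cdot \mathbf{G}) \, d\mathbf{x}
   + \frac{1}{2} \int \nabla \frac{p}{q} \cdot (\nabla q \cdot \mathbf{G} + q \nabla \cdot \mathbf{G}) \, d\mathbf{x} \\
&= \frac{1}{2} \int \frac{1}{p} \, \nabla \frac{p}{q} \cdot \mathbf{G} \cdot (p \nabla q - q \nabla p) \, d\mathbf{x} \\
&= -\frac{1}{2} \int \frac{q^2}{p} \, \nabla \frac{p}{q} \cdot \mathbf{G} \cdot \nabla \frac{p}{q} \, d\mathbf{x} \\
&= -\frac{1}{2} E\left[ \frac{q}{p} \nabla \frac{p}{q} \cdot \mathbf{G} \cdot \frac{q}{p} \nabla \frac{p}{q} \right] \\
&= -\frac{1}{2} E\left[ \nabla \left( \log \frac{p}{q} \right) \cdot \mathbf{G} \cdot \nabla \left( \log \frac{p}{q} \right) \right].
\end{aligned} \tag{18} $$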



Because of the nonnegative definiteness of $\mathbf{G} = \mathbf{B}\mathbf{B}^T$, the right hand side is always smaller
than or equal to zero, in accordance with the thermodynamic entropy. (Notice the negative
sign in the definition of $D$; that is to say, an increase in $H$ corresponds to a decrease in $D$.)
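
Formula (18) can be checked against a closed-form case. The sketch below assumes the one-dimensional system $dx = ax \, dt + \sqrt{g} \, dw$ with Gaussian $p$ and $q$, both evolved by the same Fokker-Planck equation, so that each mean obeys $dm/dt = am$ and each variance obeys $dv/dt = 2av + g$. For Gaussians, $\nabla \log (p/q) = -(x - m_p)/v_p + (x - m_q)/v_q$, whose second moment under $p$ is $v_p (1/v_q - 1/v_p)^2 + (m_p - m_q)^2 / v_q^2$. All names and values are illustrative assumptions.

```python
import math

a, g = -0.4, 0.81   # assumed drift F = a*x and additive noise with G = g

def moments(m0, v0, t):
    """Exact Gaussian mean and variance at time t (requires a != 0)."""
    decay = math.exp(a * t)
    v_inf = -g / (2.0 * a)
    return m0 * decay, v_inf + (v0 - v_inf) * decay ** 2

def relative_entropy(t, p0, q0):
    """D = int p log(p/q) dx for the two evolved Gaussians."""
    m_p, v_p = moments(*p0, t)
    m_q, v_q = moments(*q0, t)
    return 0.5 * (math.log(v_q / v_p) + (v_p + (m_p - m_q) ** 2) / v_q - 1.0)

def dissipation(t, p0, q0):
    """Right side of (18): -(1/2) E_p[ grad log(p/q) * G * grad log(p/q) ]."""
    m_p, v_p = moments(*p0, t)
    m_q, v_q = moments(*q0, t)
    second_moment = v_p * (1.0 / v_q - 1.0 / v_p) ** 2 + ((m_p - m_q) / v_q) ** 2
    return -0.5 * g * second_moment

p0, q0 = (1.0, 0.3), (-0.5, 2.0)   # (mean, variance) at t = 0
t, dt = 0.8, 1e-5
dD_dt = (relative_entropy(t + dt, p0, q0)
         - relative_entropy(t - dt, p0, q0)) / (2.0 * dt)

print("dD/dt (finite difference):", dD_dt)
print("right side of (18)       :", dissipation(t, p0, q0))
```

The two printed numbers agree to within the finite-difference error and are negative, consistent with $D$ being non-increasing; the drift parameter $a$ drops out of the right side, as (18) suggests.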

We have studied the evolutionary laws for absolute entropy $H$ and relative entropy $D$
with respect to dynamical systems. For easy reference, the derived formulas are summarized
here. If the system of concern is deterministic, i.e., in the form of (1), then

$$ \frac{dH}{dt} = E(\nabla \cdot \mathbf{F}), \tag{19} $$

$$ \frac{dD}{dt} = 0. \tag{20} $$

If the system has stochasticity included, as in (9), then

$$ \frac{dH}{dt} = E(\nabla \cdot \mathbf{F}) - \frac{1}{2} E(\mathbf{G} : \nabla\nabla \ell_p), \tag{21} $$

$$ \frac{dD}{dt} = -\frac{1}{2} E\left( \nabla \ell_{p/q} \cdot \mathbf{G} \cdot \nabla \ell_{p/q} \right), \tag{22} $$

where $\ell_p = \log p$, $\ell_{p/q} = \log (p/q)$, and $\mathbf{G} = \mathbf{B}\mathbf{B}^T$. Among the four formulas, the first,
namely (5), was known before; the rest were obtained in this letter. From them we see that generally absolute
entropy does not comply with the second law of thermodynamics, unless the flow of the
deterministic vector field is nondivergent (as in a Hamiltonian system) and the noisy
perturbation is additive. The relative entropy, in contrast, proves to be non-increasing all the
time, in accordance with the second law. The dissipative mechanism has a form reminiscent
of the Fisher information.

Of particular interest among the above formulas are those for deterministic systems. They
have important implications from both theoretical and applied points of view. For example,
drifter releasing, one of the oldest methods of studying ocean circulation, has built up
a huge database for ocean scientists; but the drifter trajectories are usually chaotic and
difficult to analyze. Here (8) and (5) may help by offering two constraints.
The former states that the relative entropy is conserved. For the latter, sea water is
incompressible and hence the oceanic flow is divergence free. So by (5) the absolute entropy
of these trajectories is also a constant. The same applies to the study of atmospheric
pollutant dispersion. Although air is compressible, in an isobaric frame it is not, and
hence atmospheric flows are also divergence free. So in isobaric coordinates the pollutant
trajectories must also conserve their absolute entropy, as well as the relative entropy.

Relative entropy has an interpretation that it measures the distance between two functions
$p$ and $q$ in the function space $L^1(\mathbb{R}^n)$ (i.e., integrable functions) [?], although it does not meet
all the axioms for a metric. This interpretation makes the relative entropy conservation
law, namely (8), theoretically very interesting. To see this, examine a nonlinear system
that is sensitive to initial perturbations. The sensitivity is quantitatively characterized by
the maximal Lyapunov exponent (MLE), which measures the exponential growth of the
separation of two trajectories closely placed in the beginning [?]. More specifically, if the
Lyapunov exponent is $\lambda$, and the distance between two trajectories is $\delta(t)$, then
$|\delta(t)| \sim e^{\lambda t}$. Usually a system is considered chaotic if the MLE is positive; correspondingly the
predictability is quickly lost. Now, the relative entropy conservation law tells that, if instead
of studying the evolution of the state variables $\mathbf{x} \in \mathbb{R}^n$, we study the evolution of their joint
density $p \in L^1(\mathbb{R}^n)$, the "trajectories" in the new "phase space" $L^1(\mathbb{R}^n)$ will have equal
separations all the time. That is to say, although the trajectories of $\mathbf{x}$ may be chaotic, the
"trajectories" of $p(\mathbf{x})$ are not, and the corresponding Lyapunov exponent $\lambda$ will always be
zero.
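
The contrast between the two kinds of "trajectories" can be made concrete with a small sketch. It uses the unstable linear flow $dx/dt = ax$ with $a > 0$ purely as a tractable stand-in: this flow is not chaotic, but nearby state trajectories still separate like $e^{at}$, which is the feature of interest here. The finite-time exponents are estimated as $(1/t) \log$ of the ratio of final to initial separation, with relative entropy between two Gaussian densities playing the role of the separation in density space. Names and values are illustrative assumptions.

```python
import math

a = 0.7    # assumed flow dx/dt = a*x; state separations grow like exp(a*t)
t = 5.0
growth = math.exp(a * t)

# Two state trajectories started a small distance apart.
x0, y0 = 1.0, 1.0 + 1e-8
sep0 = abs(y0 - x0)
sep_t = abs(y0 * growth - x0 * growth)
lambda_state = math.log(sep_t / sep0) / t

def relative_entropy(m_p, v_p, m_q, v_q):
    """D = int p log(p/q) dx for two 1-D Gaussians (mean m, variance v)."""
    return 0.5 * (math.log(v_q / v_p) + (v_p + (m_p - m_q) ** 2) / v_q - 1.0)

# Two Gaussian densities carried by the same flow: means scale by exp(a*t),
# variances by exp(2*a*t).
p0, q0 = (1.0, 0.5), (1.2, 0.6)
d0 = relative_entropy(p0[0], p0[1], q0[0], q0[1])
d_t = relative_entropy(p0[0] * growth, p0[1] * growth ** 2,
                       q0[0] * growth, q0[1] * growth ** 2)
lambda_density = math.log(d_t / d0) / t

print("exponent for state trajectories:", lambda_state)
print("exponent for densities         :", lambda_density)
```

The first estimate recovers $a$, while the second is zero up to rounding, in line with the conservation law (8).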

The above observation is expected to have important implications in the active
research field of ensemble prediction. With the limited predictability of nonlinear dynamical
systems recognized, there has been a surge of interest in ensemble prediction during the past
decades, for instance, ensemble weather prediction [8]. The implication is two-fold. Firstly, the law
rationalizes the prediction technique, in that it assures the insensitivity of the distribution
to initial conditions. In this sense, the conservation law may be taken as the theoretical
basis of ensemble prediction. Secondly, the law imposes a constraint on the numerical
schemes designed for prediction. We know that, in approximating the differential operators in a
(deterministic) system for numerical computation, the underlying physics is, more or less,
changed. For instance, artificial damping at each step may be used to ensure numerical
stability; stochasticity may be deliberately introduced to parameterize the processes that
cannot be resolved by the model grids; the ensemble size may be too small to cover the
sample space, and so forth. For high dimensional problems such as weather forecasting, the
last of these is particularly severe, as the integration is very expensive. All these may lead to a
non-conservative relative entropy, and hence the resulting prediction may not be able to
reflect the real statistical physics underlying the system. How to design a relative entropy
conservative scheme is, therefore, of interest for ensemble predictions with high dimensional
systems. We leave this to future studies.